# Efficient pre-training

## Olmo2 11B SuperBPE T180k

*UW · Apache-2.0 · Large Language Model, Transformers, English · 29 downloads · 2 likes*

An 11-billion-parameter language model trained with the SuperBPE tokenizer, whose vocabulary contains both conventional subword tokens and "superword" tokens that span multiple words.
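A minimal sketch of inspecting the SuperBPE tokenizer with Hugging Face `transformers`. The repo id `UW/OLMo2-11B-SuperBPE-t180k` is inferred from the model name and developer listed above and is an assumption; adjust it if the actual id differs.

```python
from transformers import AutoTokenizer

# Assumed repo id, inferred from the listing above.
MODEL_ID = "UW/OLMo2-11B-SuperBPE-t180k"

tok = AutoTokenizer.from_pretrained(MODEL_ID)

text = "By the way, the quick brown fox jumps over the lazy dog."
ids = tok(text)["input_ids"]
pieces = tok.convert_ids_to_tokens(ids)

# A SuperBPE vocabulary can merge across whitespace, so some pieces may be
# multi-word "superword" units rather than ordinary subwords.
print(len(pieces), pieces)
```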
## Gte Multilingual Mlm Base

*Alibaba-NLP · Apache-2.0 · Large Language Model, Safetensors · 342 downloads · 12 likes*

A multilingual text encoder from the mGTE series, covering 75 languages with a maximum context length of 8,192 tokens. It uses a BERT-style architecture with rotary position embeddings (RoPE) and GLU feed-forward layers, and performs strongly on the GLUE and XTREME-R benchmarks.
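A hedged sketch of loading the encoder as a masked language model with `transformers`. Passing `trust_remote_code=True` is an assumption in case the mGTE architecture is not built into the installed `transformers` version; the repo id is inferred from the listing above.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_ID = "Alibaba-NLP/gte-multilingual-mlm-base"  # assumed repo id

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID, trust_remote_code=True)

# Predict a masked token in a simple sentence.
text = f"Paris is the capital of {tok.mask_token}."
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

mask_pos = (inputs["input_ids"] == tok.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0].tolist()
print(tok.convert_ids_to_tokens(top_ids))
```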
## Ltg Bert Babylm

*ltg · Large Language Model, Transformers, English · 594 downloads · 2 likes*

A BERT variant trained on the 100-million-word BabyLM Challenge dataset, optimized for strong performance on medium-scale corpora.
## M2 Bert 80M 2k Retrieval

*togethercomputer · Apache-2.0 · Text Embedding, Transformers, English · 538 downloads · 15 likes*

An 80M-parameter M2-BERT checkpoint pre-trained with a sequence length of 2,048 and fine-tuned for long-context retrieval tasks.
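A hedged sketch of producing a retrieval embedding with this checkpoint. The remote-code interface shown here (loading through `AutoModelForSequenceClassification`, padding to 2,048 tokens with a BERT tokenizer, and reading a `sentence_embedding` output) follows the pattern the M2-BERT retrieval model cards describe, but treat the details as assumptions if your version differs.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MAX_LEN = 2048  # sequence length this checkpoint was trained for

# Assumed loading path; M2-BERT ships custom modeling code.
model = AutoModelForSequenceClassification.from_pretrained(
    "togethercomputer/m2-bert-80M-2k-retrieval", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", model_max_length=MAX_LEN)

inputs = tokenizer(
    ["Every morning, I make a cup of coffee to start my day."],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=MAX_LEN,
    return_token_type_ids=False,
)
outputs = model(**inputs)

# Assumed output key for the pooled embedding.
embedding = outputs["sentence_embedding"]
print(embedding.shape)
```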
## Retromae Small Cs

*Seznam · Text Embedding, Transformers, Other · 7,759 downloads · 5 likes*

A BERT-small model developed by Seznam.cz and pre-trained on Czech web corpora with the RetroMAE objective, suitable for a range of Czech natural language processing tasks.
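A minimal sketch of using the encoder for Czech sentence embeddings. Taking the [CLS] vector is the usual pooling choice for RetroMAE-style encoders and is an assumption here; the repo id is inferred from the listing above.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "Seznam/retromae-small-cs"  # assumed repo id

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

sentences = ["Dnes je krásný den.", "Počasí je dnes nádherné."]
batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    out = model(**batch)

# [CLS] pooling: take the first token of the last hidden state (assumed choice).
emb = out.last_hidden_state[:, 0]
emb = torch.nn.functional.normalize(emb, dim=-1)
print((emb[0] @ emb[1]).item())  # cosine similarity of the two sentences
```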
## Sheared LLaMA 1.3B

*princeton-nlp · Apache-2.0 · Large Language Model, Transformers · 11.09k downloads · 94 likes*

Sheared-LLaMA-1.3B is an efficient 1.3B-parameter language model derived from LLaMA-2-7B through structured pruning followed by continued pre-training.
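A minimal generation sketch with `transformers`, assuming the Hugging Face repo id `princeton-nlp/Sheared-LLaMA-1.3B` (inferred from the model name and developer above).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "princeton-nlp/Sheared-LLaMA-1.3B"  # assumed repo id

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

inputs = tok("Structured pruning of large language models", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```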
## Efficient Mlm M0.15

*princeton-nlp · Large Language Model, Transformers · 116 downloads · 1 like*

A research checkpoint studying how much of the input should be masked in masked language modeling; this model masks 15% of input tokens and uses pre-layer normalization.
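The 15% masking rate studied by this checkpoint corresponds to the standard MLM data-collator setting in `transformers`. A small sketch of that configuration follows; the `roberta-base` tokenizer is an illustrative choice for the demo, not necessarily the checkpoint's own tokenizer.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Illustrative tokenizer choice for the demo (assumption).
tok = AutoTokenizer.from_pretrained("roberta-base")

# mlm_probability=0.15 masks roughly 15% of input tokens, the rate this model studies.
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm=True, mlm_probability=0.15)

batch = collator([tok("Masked language modeling hides a random subset of tokens.")])
print(batch["input_ids"][0])  # randomly chosen positions replaced by the mask token
print(batch["labels"][0])     # -100 everywhere except the positions selected for masking
```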
## Distilbert Mlm 750k

*vocab-transformers · Large Language Model, Transformers · 26 downloads · 0 likes*

A DistilBERT masked language model. DistilBERT is a lightweight, distilled version of BERT that retains most of its performance with far fewer parameters.
## Rugpt3small Based On Gpt2

*ai-forever · Large Language Model, Other · 46.92k downloads · 42 likes*

A Russian pre-trained Transformer language model developed by the SberDevices team. It is based on the GPT-2 architecture, supports a sequence length of 1,024 tokens, and was trained on 80 billion tokens.
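A minimal Russian text-generation sketch, assuming the Hugging Face repo id `ai-forever/rugpt3small_based_on_gpt2` (inferred from the listing above).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ai-forever/rugpt3small_based_on_gpt2"  # assumed repo id

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Russian prompt: "Alexander Sergeyevich Pushkin was born in"
inputs = tok("Александр Сергеевич Пушкин родился в", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50, top_p=0.95)
print(tok.decode(out[0], skip_special_tokens=True))
```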
## Roberta Base Wechsel Swahili

*benjamin · MIT · Large Language Model, Transformers, Other · 222 downloads · 1 like*

A RoBERTa-base model transferred to Swahili with the WECHSEL method for efficient cross-lingual transfer.
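A minimal fill-mask sketch, assuming the repo id `benjamin/roberta-base-wechsel-swahili` (inferred from the listing above); the pipeline's own mask token is used so the exact mask string does not need to be hard-coded.

```python
from transformers import pipeline

# Assumed repo id, matching the listing above.
fill = pipeline("fill-mask", model="benjamin/roberta-base-wechsel-swahili")

# Swahili: "Nairobi is the capital city of <mask>."
prompt = f"Nairobi ni mji mkuu wa {fill.tokenizer.mask_token}."
for pred in fill(prompt):
    print(pred["token_str"], round(pred["score"], 3))
```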